Enterprise Database Systems
Working with HDFS
Hadoop HDFS: File Permissions
Hadoop HDFS: Introduction
Hadoop HDFS: Introduction to the Shell
Hadoop HDFS: Working with Files

Hadoop HDFS: File Permissions

Course Number:
it_dshdfsdj_04_enus
Lesson Objectives

Hadoop HDFS: File Permissions

  • Course Overview
  • count the number of files and view their sizes on HDFS using the count and du commands
  • configure and view permissions for individual files and directories using the getfacl and chmod commands
  • define and set permissions for the entire contents of a directory with the chmod command
  • write a simple bash script
  • automate the transfer of all the files in a directory on your local file system over to HDFS with a shell script
  • identify the data and metrics available on the HDFS NameNode UI and work with its file system explorer
  • delete a Google Cloud Dataproc cluster and all of its associated resources
  • work with file permissions in HDFS and recognize the data and metrics available in the NameNode UI (sample commands follow this list)
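
The following is a minimal sketch of the shell commands these objectives cover; the /data paths and file names are hypothetical examples, not taken from the course:

    hdfs dfs -count /data                  # directories, files, and bytes under /data
    hdfs dfs -du -h /data                  # per-file sizes in human-readable form
    hdfs dfs -getfacl /data/sales.csv      # view the access control list for a file
    hdfs dfs -chmod 640 /data/sales.csv    # set permissions on an individual file
    hdfs dfs -chmod -R 750 /data           # apply permissions to a whole directory tree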

Overview/Description

Explore reasons why not all users should have free rein over all data sets when managing a data warehouse. In this 9-video Skillsoft Aspire course, learners explore how file permissions can be viewed and configured in HDFS (the Hadoop Distributed File System) and how the NameNode UI is used to monitor and explore HDFS. For this course, you need a good understanding of Hadoop and HDFS, familiarity with the HDFS shells, and confidence in working with and manipulating files on HDFS and exploring it from the command line. The course focuses on the different ways to view the permissions linked to files and directories, and how these can be modified. Learners explore how to automate many tasks involving HDFS by scripting them, and how to use the HDFS NameNode UI to monitor the distributed file system and explore its contents. The course also reviews distributed computing and big data. The closing exercise involves writing a command for the hdfs dfs shell to count the number of files within a directory on HDFS, and performing related tasks.
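
The scripting topic can be illustrated with a short bash sketch; the local and HDFS paths below are assumptions, not taken from the course:

    #!/bin/bash
    # Hypothetical script: copy every file in a local directory up to HDFS.
    LOCAL_DIR=/tmp/exports     # assumed local source directory
    HDFS_DIR=/data/exports     # assumed HDFS destination directory

    hdfs dfs -mkdir -p "$HDFS_DIR"
    for f in "$LOCAL_DIR"/*; do
        hdfs dfs -put "$f" "$HDFS_DIR"/
    done

    # Closing-exercise style check: count directories, files, and bytes.
    hdfs dfs -count "$HDFS_DIR"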



Target

Prerequisites: none

Hadoop HDFS: Introduction

Course Number:
it_dshdfsdj_01_enus
Lesson Objectives

Hadoop HDFS: Introduction

  • Course Overview
  • recognize the need to process massive datasets at scale
  • describe the benefits of horizontal scaling for processing big data and the challenges of this approach
  • recall the features of a distributed cluster which address the challenges of horizontal scaling
  • identify the features of HDFS which enable large datasets to be distributed across a cluster
  • describe the simple and high-availability architectures of HDFS and the implementations for each of them
  • identify the role of Hadoop's MapReduce in processing chunks of big datasets in parallel
  • recognize the role of the YARN resource negotiator in enabling Map and Reduce operations to execute on a cluster
  • describe the steps involved in resource allocation and job execution for operations on a Hadoop cluster
  • recall how Apache Zookeeper enables the HDFS NameNode and YARN ResourceManager to run in high-availability mode
  • identify various technologies which integrate with Hadoop and simplify the task of big data processing
  • recognize the key features of distributed clusters, HDFS, and the inputs and outputs of the Map and Reduce phases (illustrated in the sketch after this list)
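
Although the course itself is purely theoretical, the Map and Reduce phases can be sketched with a Hadoop Streaming word count; the jar path and the input/output directories are assumptions that vary by distribution:

    # Hypothetical word count via Hadoop Streaming: the mapper emits one word
    # per line, the framework sorts and groups the words between the phases,
    # and the reducer counts consecutive duplicates.
    hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
      -input /data/books \
      -output /data/wordcount-out \
      -mapper 'tr -s "[:space:]" "\n"' \
      -reducer 'uniq -c'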

Overview/Description

Explore the concepts of analyzing large data sets in this 12-video Skillsoft Aspire course, which covers Hadoop and the Hadoop Distributed File System (HDFS), which together enable big data to be processed efficiently and in parallel on a distributed cluster. The course assumes a conceptual understanding of Hadoop and its components; purely theoretical, it contains no labs, with just enough information provided to understand how Hadoop and HDFS allow big data to be processed in parallel. The course opens by explaining the ideas of vertical and horizontal scaling, then discusses the functions Hadoop serves to horizontally scale data processing tasks. Learners explore the functions of YARN, MapReduce, and HDFS, covering how HDFS keeps track of where all the pieces of large files are distributed, how data is replicated, and how HDFS is used with Zookeeper, a tool maintained by the Apache Software Foundation that provides coordination and synchronization in distributed systems, along with other services related to distributed computing, such as a naming service and configuration management. Learn about Spark, a data analytics engine for distributed data processing.
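
For a concrete view of the block replication and placement the overview mentions, a cluster administrator could run commands such as the following; the paths are hypothetical:

    # Report how the blocks of files under /data are replicated and where
    # each replica is stored across the cluster.
    hdfs fsck /data -files -blocks -locations

    # Change the replication factor of one file and wait for it to apply.
    hdfs dfs -setrep -w 3 /data/big.log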



Target

Prerequisites: none

Hadoop HDFS: Introduction to the Shell

Course Number:
it_dshdfsdj_02_enus
Lesson Objectives

Hadoop HDFS: Introduction to the Shell

  • Course Overview
  • provision a Hadoop cluster on the cloud using the Google Cloud Platform's Dataproc service
  • identify the various GCP services used by Dataproc when provisioning a cluster
  • list the metrics available on the YARN Cluster Manager app and recognize how it can be useful to monitor job executions
  • recall the details and metrics of HDFS available on the NameNode web app and how it can be used to browse the file system
  • identify the tools of the Hadoop ecosystem which are packaged with Hadoop and recall how they can be accessed
  • configure HDFS using the hdfs-site.xml file and identify the properties which can be set from it
  • compare the hadoop fs and hdfs dfs shells and recognize their similarities to Linux shells
  • explore the web apps bundled with Hadoop, configure HDFS, and work with the HDFS shells (a provisioning sketch follows this list)
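
A minimal sketch of provisioning such a cluster with the gcloud CLI follows; the cluster name, region, zone, and worker count are placeholder values:

    # Provision a small Dataproc cluster (names and region are assumptions).
    gcloud dataproc clusters create my-hadoop-cluster \
        --region=us-central1 \
        --num-workers=2

    # Dataproc names the master node <cluster>-m; HDFS properties such as
    # dfs.replication are set in hdfs-site.xml on the cluster nodes, e.g.
    # /etc/hadoop/conf/hdfs-site.xml.
    gcloud compute ssh my-hadoop-cluster-m --zone=us-central1-a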

Overview/Description

In this Skillsoft Aspire course, learners discover how to set up a Hadoop cluster on the cloud and explore its bundled web apps: the YARN Cluster Manager app and the HDFS (Hadoop Distributed File System) NameNode UI. This 9-video course assumes a good understanding of what Hadoop is and how HDFS enables processing of big data in parallel by distributing large data sets across a cluster; learners should also be familiar with running commands from the Linux shell, with some fluency in basic Linux file system commands. The course opens by exploring the two web applications packaged with Hadoop: the UI for the YARN cluster manager and the NameNode UI for HDFS. Learners then explore the two shells which can be used to work with HDFS, the hadoop fs shell and the hdfs dfs shell. Next, you will explore basic commands which can be used to navigate HDFS, discuss their similarities with Linux file system commands, and discuss distributed computing. In a closing exercise, practice identifying the web applications used to explore and monitor Hadoop.
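
A brief sketch of the two shells side by side; the paths are hypothetical:

    # Both shells accept the same file system commands; hadoop fs works
    # against any Hadoop-supported file system, while hdfs dfs is specific
    # to HDFS.
    hadoop fs -ls /
    hdfs dfs -ls /data
    hdfs dfs -mkdir /data/raw            # Linux-like navigation commands
    hdfs dfs -cat /data/raw/sample.txt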



Target

Prerequisites: none

Hadoop HDFS: Working with Files

Course Number:
it_dshdfsdj_03_enus
Lesson Objectives

Hadoop HDFS: Working with Files

  • Course Overview
  • identify the different ways to use the ls and mkdir commands to explore and create directories on HDFS
  • transfer files from your local file system to HDFS using the copyFromLocal command
  • copy files from your local file system to HDFS using the put command
  • transfer files from HDFS to your local file system using the copyToLocal command
  • use the get and getmerge commands to retrieve one or multiple files from HDFS
  • work with the appendToFile and rm commands on the hdfs dfs shell
  • utilize HDFS commands to work with and manipulate files using the HDFS shell (sample commands follow this list)
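
A minimal sketch of the commands these objectives cover; all paths and file names are hypothetical examples:

    hdfs dfs -mkdir -p /data/products                  # create nested directories
    hdfs dfs -copyFromLocal orders.csv /data/products/
    hdfs dfs -put -f orders.csv /data/products/        # like copyFromLocal; -f overwrites
    hdfs dfs -copyToLocal /data/products/orders.csv .
    hdfs dfs -get /data/products/orders.csv ./copy.csv
    hdfs dfs -getmerge /data/products merged.csv       # concatenate all files in a directory
    hdfs dfs -appendToFile extra.csv /data/products/orders.csv
    hdfs dfs -rm /data/products/orders.csv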

Overview/Description

In this Skillsoft Aspire course, learners will encounter basic Hadoop file system operations such as viewing the contents of directories and creating new ones. This 8-video course assumes a good understanding of what Hadoop is and how HDFS enables processing of big data in parallel by distributing large data sets across a cluster; learners should also be familiar with running commands from the Linux shell, with some fluency in basic Linux file system commands. Begin by working with files in various ways, including transferring files between a local file system and HDFS (the Hadoop Distributed File System), and explore ways to create and delete files on HDFS. Then examine different ways to modify files on HDFS. After exploring the distributed computing concept, prepare to begin working with HDFS in a production setting. In the closing exercise, write a command to create the directory /data/products/files on HDFS, where the parent /data/products may not exist, and list the commands for two copy operations: one from the local file system to HDFS, and another for the reverse transfer, from HDFS back to the local host.
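
One possible answer to the closing exercise looks like the following; the file name is an assumption:

    hdfs dfs -mkdir -p /data/products/files            # -p creates missing parent directories
    hdfs dfs -put report.txt /data/products/files/     # local file system -> HDFS
    hdfs dfs -get /data/products/files/report.txt .    # HDFS -> local host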



Target

Prerequisites: none
